30 research outputs found

    Towards plant pangenomics

    As an increasing number of genome sequences become available for a wide range of species, there is a growing understanding that the genome of a single individual is insufficient to represent the gene diversity within a whole species. Many studies examine the sequence diversity within genes, and this allelic variation is an important source of phenotypic variation which can be selected for by man or nature. However, the significant gene presence/absence variation that has been observed within species and the impact of this variation on traits is only now being studied in detail. The sum of the genes for a species is termed the pangenome, and the determination and characterization of the pangenome is a requirement to understand variation within a species. In this review, we explore the current progress in pangenomics as well as methods and approaches for the characterization of pangenomes for a wide range of plant species

    Grain dispersal mechanism in cereals arose from a genome duplication followed by changes in spatial expression of genes involved in pollen development

    KEY MESSAGE: Grain disarticulation in the wild progenitor of wheat and barley evolved through a local duplication event followed by neo-functionalization resulting from changes in the location of gene expression. ABSTRACT: One of the most critical events in the process of cereal domestication was the loss of the natural mode of grain dispersal. Grain dispersal in barley is controlled by two major genes, Btr1 and Btr2, which affect the thickness of cell walls around the disarticulation zone. The barley genome also encodes Btr1-like and Btr2-like genes, which have been shown to be the ancestral copies. While Btr and Btr-like genes are non-redundant, the biological function of Btr-like genes is unknown. We explored the potential biological role of the Btr-like genes by surveying their expression profile across 212 publicly available transcriptome datasets representing diverse organs, developmental stages and stress conditions. We found that Btr1-like and Btr2-like are expressed exclusively in immature anther samples throughout Prophase I of meiosis within the meiocyte. The similar and restricted expression profile of these two genes suggests they are involved in a common biological function. Further analysis revealed 141 genes co-expressed with Btr1-like and 122 genes co-expressed with Btr2-like, with 105 genes in common, supporting the involvement of Btr-like genes in a shared molecular pathway. We hypothesize that the Btr-like genes play a crucial role in pollen development by facilitating the formation of the callose wall around the meiocyte or in the secretion of callase by the tapetum. Our data suggest that Btr genes retained an ancestral function in cell wall modification and gained a new role in grain dispersal due to changes in their spatial expression becoming spike specific after gene duplication. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s00122-022-04029-8

    The pangenome of hexaploid bread wheat

    There is an increasing understanding that variation in gene presence–absence plays an important role in the heritability of agronomic traits; however, there have been relatively few studies on variation in gene presence–absence in crop species. Hexaploid wheat is one of the most important food crops in the world and intensive breeding has reduced the genetic diversity of elite cultivars. Major efforts have produced draft genome assemblies for the cultivar Chinese Spring, but it is unknown how well this represents the genome diversity found in current modern elite cultivars. In this study we build an improved reference for Chinese Spring and explore gene diversity across 18 wheat cultivars. We predict a pangenome size of 140 500 ± 102 genes, a core genome of 81 070 ± 1631 genes and an average of 128 656 genes in each cultivar. Functional annotation of the variable gene set suggests that it is enriched for genes that may be associated with important agronomic traits. In addition to variation in gene presence, more than 36 million intervarietal single nucleotide polymorphisms were identified across the pangenome. This study of the wheat pangenome provides insight into genome diversity in elite wheat as a basis for genomics-based improvement of this important crop. A wheat pangenome, GBrowse, is available at http://appliedbioinformatics.com.au/cgi-bin/gb2/gbrowse/WheatPan/, and data are available to download from http://wheatgenome.info/wheat_genome_databases.php
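
    The terms used in this abstract (pangenome, core genome, variable gene set) can be illustrated with a minimal sketch. It assumes each cultivar is already reduced to a set of gene identifiers. The cultivar names, gene identifiers and the function name partition_pangenome are hypothetical and are not taken from the study, which estimates these quantities from sequence data rather than from ready-made gene lists.

```python
from typing import Dict, Set, Tuple


def partition_pangenome(
    gene_sets: Dict[str, Set[str]],
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Split per-cultivar gene sets into (pangenome, core, variable).

    pangenome: genes present in at least one cultivar (union)
    core:      genes present in every cultivar (intersection)
    variable:  genes present in some but not all cultivars
    """
    if not gene_sets:
        return set(), set(), set()
    all_sets = list(gene_sets.values())
    pangenome = set().union(*all_sets)
    core = set.intersection(*all_sets)
    variable = pangenome - core
    return pangenome, core, variable


# Hypothetical toy data: three cultivars, five genes in total.
cultivars = {
    "cultivar_A": {"g1", "g2", "g3"},
    "cultivar_B": {"g1", "g2", "g4"},
    "cultivar_C": {"g1", "g3", "g4", "g5"},
}

pangenome, core, variable = partition_pangenome(cultivars)

# Average number of genes per cultivar, analogous to the per-cultivar
# average reported in the abstract.
mean_genes = sum(len(genes) for genes in cultivars.values()) / len(cultivars)
```

    In this toy case only g1 is shared by all three cultivars, so it forms the core, and the remaining four genes make up the variable set.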

    Assembly and comparison of two closely related Brassica napus genomes

    As an increasing number of plant genome sequences become available, it is clear that gene content varies between individuals, and the challenge arises to predict the gene content of a species. However, genome comparison is often confounded by variation in assembly and annotation. Differentiating between true gene absence and variation in assembly or annotation is essential for the accurate identification of conserved and variable genes in a species. Here, we present the de novo assembly of the B. napus cultivar Tapidor and comparison with an improved assembly of the Brassica napus cultivar Darmor-bzh. Both cultivars were annotated using the same method to allow comparison of gene content. We identified genes unique to each cultivar and differentiate these from artefacts due to variation in the assembly and annotation. We demonstrate that using a common annotation pipeline can result in different gene predictions, even for closely related cultivars, and repeat regions which collapse during assembly impact whole genome comparison. After accounting for differences in assembly and annotation, we demonstrate that the genome of Darmor-bzh contains a greater number of genes than the genome of Tapidor. Our results are the first step towards comparison of the true differences between B. napus genomes and highlight the potential sources of error in future production of a B. napus pangenome

    An investigation of causes of false positive single nucleotide polymorphisms using simulated reads from a small eukaryote genome

    Background: Single Nucleotide Polymorphisms (SNPs) are widely used molecular markers, and their use has increased massively since the inception of Next Generation Sequencing (NGS) technologies, which allow detection of large numbers of SNPs at low cost. However, both NGS data and their analysis are error-prone, which can lead to the generation of false positive (FP) SNPs. We explored the relationship between FP SNPs and seven factors involved in mapping-based variant calling - quality of the reference sequence, read length, choice of mapper and variant caller, mapping stringency and filtering of SNPs by read mapping quality and read depth. This resulted in 576 possible factor level combinations. We used error- and variant-free simulated reads to ensure that every SNP found was indeed a false positive. Results: The variation in the number of FP SNPs generated ranged from 0 to 36,621 for the 120 million base pairs (Mbp) genome. All of the experimental factors tested had statistically significant effects on the number of FP SNPs generated and there was a considerable amount of interaction between the different factors. Using a fragmented reference sequence led to a dramatic increase in the number of FP SNPs generated, as did relaxed read mapping and a lack of SNP filtering. The choice of reference assembler, mapper and variant caller also significantly affected the outcome. The effect of read length was more complex and suggests a possible interaction between mapping specificity and the potential for contributing more false positives as read length increases. Conclusions: The choice of tools and parameters involved in variant calling can have a dramatic effect on the number of FP SNPs produced, with particularly poor combinations of software and/or parameter settings yielding tens of thousands in this experiment. Between-factor interactions make simple recommendations difficult for a SNP discovery pipeline but the quality of the reference sequence is clearly of paramount importance. Our findings are also a stark reminder that it can be unwise to use the relaxed mismatch settings provided as defaults by some read mappers when reads are being mapped to a relatively unfinished reference sequence from e.g. a non-model organism in its early stages of genomic exploration
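
    Two of the factors named in this abstract, filtering of SNPs by read depth and by read mapping quality, can be sketched as a simple threshold filter. This is an illustration only: the SnpCall record, the filter_snps function and the threshold values (min_depth=10, min_mapq=30.0) are assumptions made for the example and are not the settings or tools used in the study.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class SnpCall:
    """Hypothetical minimal record for one candidate SNP."""
    contig: str
    position: int
    read_depth: int              # number of reads covering the site
    mean_mapping_quality: float  # average mapping quality of those reads


def filter_snps(
    calls: Iterable[SnpCall],
    min_depth: int = 10,     # assumed threshold, not from the study
    min_mapq: float = 30.0,  # assumed threshold, not from the study
) -> List[SnpCall]:
    """Keep only calls that pass both the depth and mapping-quality cutoffs."""
    return [
        call
        for call in calls
        if call.read_depth >= min_depth
        and call.mean_mapping_quality >= min_mapq
    ]


# Hypothetical candidate calls.
calls = [
    SnpCall("contig_1", 1042, read_depth=25, mean_mapping_quality=58.0),
    SnpCall("contig_1", 2310, read_depth=3, mean_mapping_quality=60.0),   # fails depth
    SnpCall("contig_2", 77, read_depth=40, mean_mapping_quality=12.5),    # fails mapping quality
]

kept = filter_snps(calls)
```

    With variant-free simulated reads, as in the study design, any call that survives such a filter would still be a false positive, so the count of surviving calls is the quantity being compared across settings.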

    An efficient approach to BAC based assembly of complex genomes

    Background: There has been an exponential growth in the number of genome sequencing projects since the introduction of next generation DNA sequencing technologies. Genome projects have increasingly involved assembly of whole genome data, which produces inferior assemblies compared to traditional Sanger sequencing of genomic fragments cloned into bacterial artificial chromosomes (BACs). While whole genome shotgun sequencing using next generation sequencing (NGS) is relatively fast and inexpensive, this method is extremely challenging for highly complex genomes, where polyploidy or high repeat content confounds accurate assembly, or where a highly accurate ‘gold’ reference is required. Several attempts have been made to improve genome sequencing approaches by incorporating NGS methods, with variable success. Results: We present the application of a novel BAC sequencing approach which combines indexed pools of BACs, Illumina paired read sequencing, a sequence assembler specifically designed for complex BAC assembly, and a custom bioinformatics pipeline. We demonstrate this method by sequencing and assembling BAC cloned fragments from bread wheat and sugarcane genomes. Conclusions: We demonstrate that our assembly approach is accurate, robust, cost effective and scalable, with applications for complete genome sequencing in large and complex genomes

    The giant diploid faba genome unlocks variation in a global protein crop

    Increasing the proportion of locally produced plant protein in currently meat-rich diets could substantially reduce greenhouse gas emissions and loss of biodiversity. However, plant protein production is hampered by the lack of a cool-season legume equivalent to soybean in agronomic value. Faba bean (Vicia faba L.) has a high yield potential and is well suited for cultivation in temperate regions, but genomic resources are scarce. Here, we report a high-quality chromosome-scale assembly of the faba bean genome and show that it has expanded to a massive 13 Gb in size through an imbalance between the rates of amplification and elimination of retrotransposons and satellite repeats. Genes and recombination events are evenly dispersed across chromosomes and the gene space is remarkably compact considering the genome size, although with substantial copy number variation driven by tandem duplication. Demonstrating practical application of the genome sequence, we develop a targeted genotyping assay and use high-resolution genome-wide association analysis to dissect the genetic basis of seed size and hilum colour. The resources presented constitute a genomics-based breeding platform for faba bean, enabling breeders and geneticists to accelerate the improvement of sustainable protein production across the Mediterranean, subtropical and northern temperate agroecological zones.

    Skim-based genotyping by sequencing

    Genotyping by sequencing (GBS) is a relatively new method used to determine the differences in the genetic makeup of individuals. Its novelty stems from a combination of two already available methods: genotyping and next-generation sequencing. Depending on the individual study design, GBS protocols can take multiple forms; however, most share a sequence of core steps that have to be undertaken. These include: sequencing of the DNA from the individuals of interest (usually two parents of a mapping population and their progeny), mapping of the sequencing reads to the reference sequence, SNP calling and filtering, SNP genotyping and imputation, followed by haplotype identification and downstream analysis. GBS has a range of applications from general marker discovery, haplotype identification, and recombination characterization to quantitative trait locus (QTL) analysis, genome-wide association studies (GWAS), and genomic selection (GS). It has already been applied to a range of plant species including rice, maize, artichoke, and Arabidopsis thaliana. It is a promising approach which is likely to provide new and important insights into plant biology

    Genome-wide analysis of the Hsf gene family in Brassica oleracea and a comparative analysis of the Hsf gene family in B. oleracea, B. rapa and B. napus

    Abiotic and biotic stresses induced by global climate change are predicted to affect crop-growing seasons and crop yield. Heat stress transcription factors (Hsfs) have been suggested to play a significant role in various stress responses. They are an integral part of the signal transduction pathways that operate in response to environmental stresses. Brassica oleracea is an agronomically important crop species that includes cabbage, cauliflower, broccoli, Brussels sprout, kohlrabi and kale. The identity and roles of Hsfs in this important Brassica species are unknown. The availability of the whole genome sequence of B. oleracea provides an opportunity for in silico analysis of Hsf genes in B. oleracea. Thirty-five putative genes encoding Hsf proteins were identified and classified into A, B and C classes. Their evolution, physical location, gene structure, domain structure and tissue-specific expression patterns were investigated. Further, a comparative analysis of the Hsf gene family in B. oleracea, B. rapa and B. napus highlighted the role of hybridisation and allopolyploidy in the evolution of the largest known Hsf gene family in B. napus. The presence of orthologous gene clusters, found in Brassica species, but not in A. thaliana, suggested that polyploidisation has resulted in the formation of new Brassica-specific orthologous gene clusters. Gene duplication analysis indicated that the evolution of the Hsf gene family was under strong purifying selection in these Brassica species. High-level synteny was observed within the B. napus genome. Conservation of physical location, the similarity of structure and similar expression profiles between the B. napus Hsf genes and the corresponding genes from B. oleracea and B. rapa suggest a high functional similarity between these genes. This study paves the way for further investigation of Hsf genes in improving stress tolerance in B. oleracea. The genes thus identified may be useful for developing crop varieties resilient to global climate change